Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[57932] Make Mails::MailerJob retry failed emails #16871

Merged

Conversation

cbliard
Copy link
Member

@cbliard cbliard commented Oct 1, 2024

Ticket

https://community.openproject.org/wp/57932

What are you trying to accomplish?

Make OpenProject retry sending emails when sending fails for a temporary reason (short outage of provider for instance).

What approach did you choose and why?

Mailer jobs were discarded instead of being retried. Here is why:

Previously the retry_on StandardError defined in the Mails::MailerJob class was ignored because there is also a rescue_from StandardError declared after it. As retry_on is implemented using rescue_from, and the handlers are evaluated in reverse order, the last declared rescue_from would be picked up and the retry logic would not be triggered. That's why mail jobs were discarded instead of being retried.

Here is how it was fixed:

The issue was fixed by inheriting directly from ActionMailer::MailDeliveryJob instead of redefining its methods. This
way the rescue_from is declared first in the parent class, and then retry_on is declared second in the child class, meaning it will take precedence (because declared last) and be picked up.

The needed shared logic from ApplicationJob is extracted to a new
SharedJobSetup module and included in both ApplicationJob and
Mails::MailerJob.

Merge checklist

  • Added/updated tests
  • Added/updated documentation in Lookbook (patterns, previews, etc)
  • Tested major browsers (Chrome, Firefox, Edge, ...)

We want `MailerJob` to behave like both our `ApplicationJob` (for
reloading the mailer configuration and resetting the request store
automatically) and `ActionMailer::MailDeliveryJob` (for the basic
functionalities of sending emails).

As we can only inherit from one of these two classes, we inherit from
`ApplicationJob` and copy the relevant parts from
`ActionMailer::MailDeliveryJob`.

This copy was done some time ago, and the `ActionMailer::MailDeliveryJob`
class has evolved since then. This commit updates our copy to match the
current `ActionMailer::MailDeliveryJob` class.
https://community.openproject.org/wp/57932

Retrying 14 times with polynomial backoff means it will retry for
roughly 1.5 days.

Previously the `retry_on StandardError` defined in the
`Mails::MailerJob` class was ignored because there is also a
`rescue_from StandardError` declared after it. As `retry_on` is
implemented using `rescue_from`, and the handlers are evaluated in
reverse order, the last declared `rescue_from` would be picked up and
the retry logic would not be triggered. That's why mail jobs were
discarded instead of being retried.

This commit fixes the issue by inheriting directly from
`ActionMailer::MailDeliveryJob` instead of redefining its methods. This
was the `rescue_from` is declared first in the parent class, and then
`retry_on` is declared second in the child class, meaning it will take
precedence and be picked up.

The needed shared logic from `ApplicationJob` is extracted to a new
`SharedJobSetup` module and included in both `ApplicationJob` and
`Mails::MailerJob`.
@cbliard cbliard marked this pull request as ready for review October 2, 2024 10:25
@cbliard cbliard requested a review from machisuji October 2, 2024 10:29
Copy link
Member

@machisuji machisuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for fixing that, @cbliard ! While it doesn't make sense to test rails ActiveJob logic, we have clearly been using it wrong in the past and didn't notice that it didn't work as intended. This is why I think this justifies a spec to test that we are using it correctly.

Could you please add one?

@cbliard
Copy link
Member Author

cbliard commented Oct 2, 2024

Sure, I can try adding one. I'm curious to see what I will end up with.

@cbliard
Copy link
Member Author

cbliard commented Oct 2, 2024

@machisuji I added a test for it. Can you review it again?

Also I'm concerned that it will be a bit harder to find errors with unit tests: if a mailer produces a StandardError for some reason, then the exception will be catched and the mailing job reenqueued for a retry. Previously it would have raised the error so it was easy to spot.

You can easily see it if you comment the line retry_on ... in the Mails::MailerJob and run the spec: it displays the "ArgumentError" error and the test fails. For that test it's ok because that's what we test, but if we introduce a bug in a mailer, will our test suite still catch it?

@cbliard cbliard requested a review from machisuji October 2, 2024 14:11
@machisuji
Copy link
Member

machisuji commented Oct 2, 2024

@machisuji I added a test for it. Can you review it again?

Also I'm concerned that it will be a bit harder to find errors with unit tests: if a mailer produces a StandardError for some reason, then the exception will be catched and the mailing job reenqueued for a retry. Previously it would have raised the error so it was easy to spot.

You can easily see it if you comment the line retry_on ... in the Mails::MailerJob and run the spec: it displays the "ArgumentError" error and the test fails. For that test it's ok because that's what we test, but if we introduce a bug in a mailer, will our test suite still catch it?

Great! Thanks for adding the test!

As for your concern, I suppose it does obscure the failure for a test in case we do break a mailer somewhere. But usually it should still be found because we would probably test that a mail did go out and contained certain contents?

I suppose you could amend the spec helper with a retry_stopped (ActiveSupport::Notifications.subscribe "retry_stopped.active_job" { } (see active support instrumentation) subscription and raise the error there in all specs except for this retry spec?

Copy link
Member

@machisuji machisuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@cbliard
Copy link
Member Author

cbliard commented Oct 2, 2024

I suppose you could amend the spec helper with a retry_stopped (ActiveSupport::Notifications.subscribe "retry_stopped.active_job" { } (see active support instrumentation) subscription and raise the error there in all specs except for this retry spec?

I get the idea but that would not work: retry_stopped notification happens when the number of attempts has been reached, right before the job is discarded and the error is raised.

Well, the jobs are not performed that often. It should not be too much of a problem.

@cbliard cbliard merged commit c84bd56 into release/14.6 Oct 2, 2024
11 checks passed
@cbliard cbliard deleted the bugfix/57932-make-mailerjob-retry-failed-emails branch October 2, 2024 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants